Semi-Markov models of mRNA-translation 
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Translation is the cellular process in which ribosomes make proteins from information encoded 
on messenger RNA (mRNA). We model translation with an exclusion process taking into account 
the experimentally determined, non-exponential, waiting time between steps of a ribosome. From 
numerical simulations using realistic parameter values, we determine the distribution P(E) of the 
number of proteins E produced by one mRNA. We find that for small E this distribution is not 
geometric. We present a simplified and analytically solvable semi-Markov model that relates P(E) 
to the distributions of the times to produce the first E proteins. 
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Biological cells respond to external or internal sig- 
nals by producing proteins. According to the central 
dogma of molecular biology pQ, the production of pro- 
teins from genes occurs in two steps. In the first one, 
called transcription, information encoded in a gene is 
read by an RNA-polymerase and used to synthesize a 
messenger RNA (mRNA) molecule. In the second step, 
named translation, a ribosome moves along the mRNA 
and uses the information stored on its codons to make a 
new protein pQ . 

Cellular processes are inherently stochastic [2] because 
the various molecules involved occur in small numbers 
[3J. In this Letter, we will focus on stochastic aspects of 
the translation process. We also take into account that 
mRNA is an unstable molecule which, through the action 
of RNase, decays with a rate A. For example, the mRNA 
that produces the protein tsr- Venus in E. Coli has an 
average lifetime 1 / A = 90 seconds |H |3] . 

A simple theory for stochastic protein production was 
proposed more than thirty years ago by O. Berg [5J. 
Assuming that proteins are produced with a rate kt, 
he showed that the number of proteins produced by 
one mRNA follows a geometric distribution with aver- 
age k t /A. Recently, it has become possible to measure 
this distribution experimentally [5, 7 and fair agreement 
with Berg's theory was found. Because of its simplic- 
ity, the geometric distribution is also used in theoretical 
models of stochastic gene expression [HIE]. Yet, Berg's 
theory cannot be the complete story and in this Letter 
we examine the robustness of his results to the addition 
of a few more realistic ingredients. 

Each mRNA molecule has two different ends labelled 
as 3' and 5'. A ribosome first attaches to the 3' end 
(initiation) after which it moves forward codon by codon 
towards the 5' end (elongation). In each step, an amino 
acid is added to the growing protein. Upon reaching 
the 5' end, the ribosome detaches (termination) and the 
newly synthesized protein is released. Each of the pro- 
cesses of initiation, elongation and termination involves 
several biochemical steps [TU] . These individual steps can 



be considered to be memoryless (Markov) and therefore 
to have an exponential waiting time distribution. But on 
a more coarse grained level, quantities like the waiting 
time between two elongation steps generically have more 
complicated distributions. Indeed, recent single molecule 
experiments have shown that this so called dwell time can 
be fitted well by a gamma-distribution or by a difference 
of exponentials [IT] . 

Berg's theory also does not take into account that at 
a particular moment, several ribosomes are attached to 
a mRNA. A given ribosome can then only move forward 
if there is sufficient free space in front of it. This leads 
to delays and is an extra source of stochasticity [H] . A 
well known model that describes these effects [13] is the 
totally asymmetric exclusion process (TASEP) with ex- 
tended objects [H[T1]. In the TASEP, mRNA is repre- 
sented as a one-dimensional lattice of L sites, each site 
corresponding with one codon, that can be occupied by at 
most one ribosome. Ribosomes are large in comparison 
with a codon and are therefore modelled as an extended 
'particle' that covers I lattice sites. Following [2], we 
will take I = 12. In this Letter, we study a version of 
the TASEP with extended particles where the dwell time 
r has a waiting time probability density (WTPD) ip(r) 
that is gamma-distributed 

_n— 1 jLn 

^(r) = — — exp[-fcr] (1) 
T(n) 

Stochastic processes of this type with non-exponential 
waiting times are called semi-Markov in the mathemat- 
ical literature. In our model, ribosomes attach to the 
3'-end of mRNA with rate a (provided the first 12 sites 
are empty), leave the 5'-end with rate (3, and the whole 
translation process stops with rate A due to degradation 
of mRNA. These latter three process are still treated as 
Markovian. Fig. 1 shows the ingredients of our model. 

The average waiting time for the gamma-distribution 
([I]) is n/k, so that the ribosome on average needs a time 
k/n to make one elongation step. We will take this as 
the unit of time in our simulations, so that we can put 
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FIG. 1: (Color online) Semi-Markov lattice model of trans- 
lation. mRNA is modelled as a lattice of L sites, each site 
representing one codon. Translation is initiated with rate a 
and terminated with rate /3 while mRNA degrades with rate 
A. Elongation proceeds with a waiting time probability den- 
sity V(t). 



k = n in (JTj). For E. Coli, this corresponds in good 
approximation to 1/15 seconds [3]. For the initiation and 
termination rate, we take 3 respectively 1/2 seconds, as 
determined experimentally [15] . In the chosen time unit, 
we therefore have a = 1/45 and (3 — 2/15. Finally, we 
ran most of our simulation for A = 1/1350 and L = 815, 
the values appropriate for the tsr- Venus protein. 

In order to simulate a semi-Markovian TASEP, we used 
a modified continuous time Monte Carlo method. In our 
algorithm, every ribosome has a clock associated to it, 
which states the remaining waiting time until the next 
attempt at movement. Once a clock reaches zero, the 
attempt is made, and the clock is (re)set. The waiting 
time distribution for each clock depends on the site the 
particle is currently located at: for the first site and last 
site, the waiting time is exponentially distributed with 
parameter a respectively /3, while for sites in the bulk, we 
sample the waiting time from ip( T ) as given in ([I]). At the 
beginning of the simulation of a given realisation, we also 
draw a random time r from an exponential distribution 
Ae _At . Once the time of the simulation exceeds To, that 
history is stopped. 

In the simulations, we start at t = with an empty 
lattice, and then initiate translation. We monitor the 
current j r (t) of ribosomes leaving the mRNA at the 5'- 
cnd. In the absence of decay, the total number of proteins 
produced up to time t, E{t), equals the time-integrated 
ribosome current E(t) = J* j r (t')alt' . After some time, 
we expect our model to reach a non-equilibrium steady 
state (NESS) in which quantities like the average (j r {t)) 
and the variance Aj r (t) of the current become time- 
independent. In Fig. 2, results for these quantities 
are shown. We also have indicated with a vertical line 
t = 5/ A, a time at which almost all mRNA will have 
decayed. We see that the average and the variance of 
the number of proteins produced per time unit increases 
very rapidly before reaching a steady state value. We 
also notice that this time is much greater than the av- 
erage lifetime 1/A of mRNA. Ribosomes therefore seem 
to work in an early time regime where the current and 



its fluctuations are rather small. This may be a mech- 
anism to reduce fluctuations in the number of proteins 
produced. 
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FIG. 2: (Color online) Average and variance (inset) of the 
ribosome current leaving mRNA as a function of time, in the 
absence of mRNA decay. The vertical line is at t = 5/A. Data 
= 0.5 and are averaged over 10 3 histories. 
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We now return to the model with mRNA-decay. From 
our simulations we have determined the full distribution 
P(E) of the number E of proteins produced. Our results 
are shown in a semi-log plot in Fig. 3. We see, that for 
E > 4, the distribution is still geometric but that there 
is a deviation at the smallest E values. In Fig. 3 (inset) 
we have plotted the experimental data for tsr- Venus [5] 
in a similar way. Notice that while these are consistent 
with a geometric distribution, they also show an excess 
of cases where only a few proteins are produced. This 
could be an experimental signature of the effect seen in 
our model. It is interesting to remark that for larger val- 
ues of the initiation and/or termination rate, the range 
over which the data deviate from a geometrical distribu- 
tion becomes larger. Given that our values of a and /3 
are rather rough estimates, and that they may vary from 
protein to protein, it is quite possible that in other realis- 
tic situations a larger deviation from geometric behaviour 
can be observed. 

From our data, we have calculated AE/(E) 2 (where 
AE and (E) are the variance and average of the number 
of proteins produced), a common noise measurement in 
the experimental literature on stochastic gene expression. 
This quantity was found to depend only weakly on k. An 
average over a range of k- values gives AE/(E) 2 = .98 ± 
0.02 to be compared with the experimental data in [5], 
from which one finds AE/(E) 2 — .74. Given the relative 
simplicity of our model and the fact that no parameters 
were fitted we judge the agreement to be quite satisfying. 
Adding extra detail to our model, like fast and slow bonds 
[15] . might improve the agreement. 

Finally, we calculated the waiting time te between the 
production of the (E— l)-th and E-th protein, given that 
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FIG. 3: (Color online) Semi-log plot of the distribution 
P(E) of the number of proteins produced as calculated us- 
ing our model (for k = 0.5) and from experiments (inset). 
The straight lines are best fits to a geometric distribution. 



can be achieved by replacing Berg's reaction scheme for 
mRNA (Fig. 5, leftside) with a semi-Markov one (Fig. 5, 
rightside) in which mRNA decays with WTPD i/jq (t) and 
where the E-th protein is produced with WTPD 4>e{t). 
Conservation of probability implies 



[^(r)+^(r)]dr=l 



(2) 



We denote by pe — / °° iPe(t)cIt, the probability that 
the -Eth protein is made before the mRNA has decayed. 



mRNA 




mRNA 




mRNA did not decay. Fig. 4 shows the probability den- 
sities 4>e of these waiting times for E = 1 (inset) and 
E = 2, • • • , 6. It takes a relatively long time to produce 
the first protein, but then protein production is on aver- 
age periodic [T^]. A more detailed look shows that the 
distributions become independent of E from E = 4 on- 
wards. Moreover, they are clearly non-exponential. We 
were not able to fit these WTPD's with a simple proba- 
bility density like that of the gamma-distribution. 




FIG. 5: (Color online) Berg's standard model for protein 
production (left) and our semi-Markov extension (right). 

Within this effective model, it is possible to calculate 
the distribution of the number of proteins produced ex- 
actly. We outline this derivation here. The state space 
of our semi-Markov model consists of the empty state 
representing a decayed mRNA, and the states 0, 1, 2, • • • 
giving the number E of proteins produced. We assume 
that at t = 0, the process starts at E = 0. In that case 
the probability R(E, t) that the system is in the state 
E at time t evolves according to the generalised master 
equation [III IPS] 



dR(E,t) 
Jt 



/ K E {t-r)R{E -l,r)dr 
Jo 



500 



-/ [K E+1 (t-T) + Ku(t-T)]R(E,T)dT (3) 
Jo 

where the memory kernels Ke and Kq are defined in 
([5]) below and R(—l,t) — 0. To solve this set of equa- 
tions, we first do a Laplace transformations R(E, s) = 
Io° e ~ st R(E, t)dt. Inserting the initial condition, (J3j) be- 
comes 



FIG. 4: (Color online) Probability density iI>e of the wait- 
ing time between the production of the (E — l)-th and E-ih 
protein for E = 1, • • • ,6. Data are for k = n — 0.5 and are 
averaged over 5 x 10 6 histories. 



One of the interesting features of Berg's original model 
is its simplicity, which makes it very easy to use in cal- 
culations of the distribution of the number of proteins 
in a cell |9] . Our lattice model is too complicated for 
this. We would therefore like to replace it by a simpler 
effective model in the spirit of Berg, yet sharing most of 
the properties of the full model shown in Fig. 1. This 



sR{E, s) - 5 E ,o = K E (s)R(E - 1, s) 
- \K E+ i( S )+Kv(s)] R(E,s) 



(4) 



where the Laplace transforms of the kernels can be re- 
lated to those of the waiting time distributions [15] 



K E (s) 



&e(s) 



and 



<Pe(s) = ~ [1 - ^e( s ) - i>i(s)] 



(5) 



(G) 
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is the Laplace transform of 4>e(t) = 1 — 
Jo'IV'bW+^dW] tne probability that no tran- 
sition from the state E — 1 has been made up to time 
t. Since R{— 1, s) = 0, the set of equations Q can be 
solved recursively giving 



R{E,s) = 5 E ,o4>i{s) + 



(7) 



We are interested in the probability P(E) that E pro- 
teins are produced before the mRNA decays. This proba- 
bility equals the probability that E proteins are produced 
up to time t' times the conditional probability that the 
mRNA decays in the time interval (if, if + d£) given that 
it didn't decay before, summed over all if 



P{E) 



R{E,t ) 



dt' 



(8) 



We now assume, as in the lattice model, that mRNA- 
decay can still be described as a Markovian process with 
rate A. In that case, (fsl) simplifies considerably and be- 



comes 



P(E) = A / R(E, t')dt' = XR(E, 0) 



(9) 



so that P(E) can be obtained from iterating ^ at s = 0. 
The iteration involves ?A_e(0) = Pe arid <^_b(0) which using 
^ and ^ becomes 



<Mo) = -^(o)-v (o) = <r B _ 



(10) 



where (te—i) is the average time that the model stays 
in the state E — 1. Putting everything together we find 
finally 



P(0) = A(r > 
P(E>0) = \ (f[pi) (te) 



(11) 



\i=l 



Suppose now, that as was observed in our lattice model, 
iPe(j) becomes independent of E for E > Eq. We then 
find for E > E„ 



P(E) 



'Eo-l 



M n 



Pi 

PEo 



K T E a ) 



(PEoY 



(12) 



which is precisely a geometric distribution. In this way 
we understand that the observed deviations from the 
geometric distribution are related to the -E-dependence 
of the WTPD i/je(t). The simplified model therefore 
gives an explanation of the behaviour found in the lat- 
tice model. 

In summary, we have studied the influence of mem- 
ory effects on mRNA-translation. Taking as an input 
the experimentally determined waiting time distribution 



between two steps of a ribosome, we determine the dis- 
tribution of the number of proteins produced, another 
experimentally accesible quantity. This distribution was 
found to deviate from a geometrical one. We could ex- 
plain this behavior in terms of the properties of waiting 
time probabilities. 

The deviations from geometrical behavior become 
more important when the initiation and/or termination 
rates become larger than those found in living systems. 
Recent progress has made it possible to study protein 
synthesis in a cell- free system [TU]. It can be envisaged 
that such an approach can test the predictions following 
from our model by changing, for example initiation rates. 
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